******************************************************************************************************************
Replication files for "Territorial Autonomy and the Trade-Off between Civil and Communal Violence" (Andreas Juon).
******************************************************************************************************************

This repository contains the required data and code to replicate the findings in "Territorial Autonomy and the Trade-Off between Civil and Communal Violence" (Andreas Juon, APSR) (see points 1-3 below). It also contains all data and code to transform original and external data into the dataframes and main dependent and independent variables used throughout the article's analyses (see points 4-6 below).

IMPORTANT: Before starting with the steps below, first unpack the following .rar archives. These are packaged in .rar format to prevent Dataverse from automatically changing the structure of folders containing shapefiles:
- "external_data/CShapes_augmented.rar"
- "external_data/geoEPR_augmented.rar"
- "external_data/PRIO_Grid.rar"
- "original_data/administrative_boundaries/sau1.0_fullres.rar"
- "original_data/administrative_boundaries/sau1.0_simplified_clean.rar"
For the scripts to run, the contents of each .rar file must be placed in a folder of the same name, in the same location as the original .rar file. This is the default behaviour of most software that unpacks .rar files.
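
As an optional check before running anything, the following minimal Python sketch (not part of the replication scripts) verifies that each archive listed above has been unpacked into a folder of the same name. It assumes it is run from the repository root; the function names are illustrative only.

```python
from pathlib import Path

# Archive paths as listed above, relative to the repository root.
ARCHIVES = [
    "external_data/CShapes_augmented.rar",
    "external_data/geoEPR_augmented.rar",
    "external_data/PRIO_Grid.rar",
    "original_data/administrative_boundaries/sau1.0_fullres.rar",
    "original_data/administrative_boundaries/sau1.0_simplified_clean.rar",
]


def expected_folder(archive):
    """Return the folder that should hold the unpacked contents."""
    p = Path(archive)
    # Drop only the trailing ".rar"; names such as "sau1.0_fullres" contain dots.
    return p.parent / p.name[: -len(".rar")]


def missing_folders(root="."):
    """List the expected folders that do not exist under `root`."""
    return [
        expected_folder(a)
        for a in ARCHIVES
        if not (Path(root) / expected_folder(a)).is_dir()
    ]


if __name__ == "__main__":
    for folder in missing_folders():
        print("Not yet unpacked:", folder)
```

If the script prints nothing, all five folders are in place.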


*********************
1. REPLICATION SCRIPT
*********************

All analyses except for instrumental variables analyses are conducted in R (run in RStudio 2021.09.1 Build 372). Instrumental variables analyses are conducted in Stata (SE 16.1).

All figures and tables in the main article text and in the online appendix, with the exception of the instrumental variables analyses, can be compiled by running the master file "replication_script/___replication_master_file.R". This file will call the individual analysis scripts located in the same folder. Make sure to install all R libraries called at the beginning of "replication_script/___replication_master_file.R" on your system. To run the instrumental variables analyses using Stata, open and run the file "replication_script/appendix4a.do".

For details and/or partial replication, open and run the respective source scripts. They run independently, provided (for the R scripts) that you have loaded all libraries and the respective analysis datasets specified in "replication_script/___replication_master_file.R". NOTE: Replication files "group_grid_year.csv" and "dyad_grid_year.csv", required to run analyses at the group/dyad grid cell-year level, are zipped due to Dataverse's file size limits. These need to be unzipped before running analyses at these levels.

All output is individually stored as .pdf figures in the subfolder "figures" or as text tables in the subfolder "tables". File "juon_autonomy_violence_appendix.pdf" collates all appendix tables and figures ("Figure A1", "Table A1", etc.); file "tablesA_full.pdf" collates full versions of results tables which are abbreviated in the appendix; file "tablesX.pdf" collates all results tables for models whose results are only depicted visually in the appendices (referred to as "Table X1", etc.); finally, file "juon_autonomy_violence_data_supplement.pdf" collates all supplementary tables and figures ("Figure S1", "Table S1", etc.).

Special note to users running the R script on Mac: If running the replication master file in RStudio on Mac, the long script may activate a well-known R software bug, related to the combination of standard and special characters in the ggplot axis labels. This may generate the error message "Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :polygon edge not found". If this bug occurs, it helps to restart RStudio, re-run step 1 ("Preliminaries"), reimport the data being worked on, and resume running the R scripts in the section where the error message first appeared.


*******************
2. REPLICATION DATA
*******************

All replication data at the various analysis levels can be found in the folder "replication_data". These data files are called by the R and Stata replication scripts. 

NOTE: The replication files contain all variables required for running the analyses at the respective levels, as called by the R and Stata scripts. Owing to their measurement level, some variables are included in some of these replication datasets, but omitted in others. For instance, controls for land-use ("agri_ih", "pasture_ih", "grass_ih") and for included elites of the largest group in a given grid cell ("gidl_g_included"), both of which are measured at the grid cell level, are included only in the (group/dyad) grid cell-level files, but not in the (group/dyad/directed dyad) administrative unit-level files. Conversely, the variable for total fatalities across communal violence incidents involving a dyad in a given administrative unit year ("tot_best_fat") is only included in the dyadic administrative unit year file, not in the others (e.g. replication datasets at the group/dyad grid cell year or group year level). 

NOTE 2: Files "group_grid_year.csv" and "dyad_grid_year.csv", required to run analyses at the group/dyad grid cell-year level, are zipped due to Dataverse's file size limits. These need to be unzipped before running analyses at these levels.
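
Any standard unzip tool works for this step. As one possible approach, the minimal Python sketch below extracts every .zip archive found directly in a given folder, in place. The archive names are not assumed; only the folder name "replication_data" is taken from this readme, and the helper name is illustrative.

```python
import zipfile
from pathlib import Path


def unzip_all(folder):
    """Extract every .zip archive found directly in `folder`, in place.

    Returns the names of all extracted members.
    """
    extracted = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
            extracted.extend(zf.namelist())
    return extracted


if __name__ == "__main__":
    # Run from the repository root.
    for name in unzip_all("replication_data"):
        print("Extracted:", name)
```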

Overview of replication data files:
- "group_unit_year.csv": all variables at level of group-administrative unit-year (for main analyses and most robustness checks).
- "dyad_unit_year.csv": all variables at level of dyad-administrative unit-year (for main analyses and most robustness checks).
- "unit_year.csv": all variables at level of administrative unit-year (for robustness checks in appendix 3.6).
- "group_grid_year.csv": all variables at level of group-grid cell-year (for robustness checks in appendix 3.7 and instrumental variable analyses in appendix 4).
- "dyad_grid_year.csv": all variables at level of dyad-grid cell-year (for robustness checks in appendix 3.7 and instrumental variable analyses in appendix 4).
- "group_year.csv": all variables at level of group-year (for robustness checks in appendix 3.7).
- "dyad_year.csv": all variables at level of dyad-year (for robustness checks in appendix 3.7).
- "dyad_directed_unit_year.csv": all variables at level of directed dyad-administrative unit-year (for analyses of one-sided violence in appendix 5.2).
- "individual.csv": all variables at level of individual-(survey-)year (for analyses of group-wise grievances in appendix 5.3).
- "conflict_intervention.csv": all variables at level of conflict year (for analyses of state intervention in non-state conflicts in Africa, 1989-2010 in appendix 5.4; based on Elfversson 2015).
- "conflict_peace_processes.csv": all variables at level of conflict year (for analyses of mediation, negotiations, and agreements in non-state conflicts in Africa, 1989-2019 in appendix 5.4; based on Duursma & Gamez 2022).


************************
3. EXPLANATORY DOCUMENTS
************************

Besides this readme file, this folder also contains two additional explanatory documents:
- The file "variable_dictionary.csv" contains a list of all variables in the replication data, along with explanations of what they measure and what their underlying sources are. The main independent and dependent variables are all generated in the data set-up script provided as part of these supplementary materials (see point 6 below). NOTE: The only variables not listed in this file are a number of dyadic variables that represent equivalents to the respective group-level variables. These always have the same name as their group-level equivalent, with the addition of suffix "1" or "2" to specify that they refer to dyad member 1 or 2, respectively. For instance, variable "int_grp_rel1" is the equivalent to "int_grp_rel" for dyad member 1, while variable "int_grp_rel2" is the equivalent to "int_grp_rel" for dyad member 2.
- The file "package_versions.csv" lists the name and version of all R packages required to replicate my analyses.
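
The suffix convention for dyadic variables described above can be illustrated with a minimal Python sketch. It maps a dyadic variable name back to its group-level entry in "variable_dictionary.csv". The helper name and the idea of checking against a set of known group-level names are assumptions made for illustration; only "int_grp_rel" and its two dyadic equivalents come from this readme.

```python
def split_dyadic_name(name, group_vars):
    """Return (group-level name, dyad member), or (name, None) if not dyadic.

    A name counts as dyadic only if stripping a trailing "1" or "2" yields a
    known group-level variable, so unrelated names ending in a digit are
    left untouched.
    """
    if name[-1:] in ("1", "2") and name[:-1] in group_vars:
        return name[:-1], int(name[-1])
    return name, None


if __name__ == "__main__":
    known = {"int_grp_rel"}  # group-level names, e.g. read from the dictionary file
    print(split_dyadic_name("int_grp_rel1", known))  # ('int_grp_rel', 1)
    print(split_dyadic_name("int_grp_rel2", known))  # ('int_grp_rel', 2)
```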


****************
4. ORIGINAL DATA
****************

The folder "original_data" contains all original data contributions of this article. Subdivided into three subfolders, it provides the following data:

- Administrative boundaries: This folder contains polygons for all administrative units covered by the new Significant Administrative Units Dataset (SAU), which is introduced in this article and used throughout its empirical analyses. The variables in this dataset are described in detail in the data supplement .pdf file (section S1.1). 
-- "sau1.0_fullres/sau1.0_fullres.shp" contains full-resolution polygons. 
-- "sau1.0_simplified_clean/sau1.0_simplified_clean_firstorderunits.shp" and "sau1.0_simplified_clean/sau1.0_simplified_clean_specialregion.shp" contain simplified and cleaned geometries of all polygons (first order units and special autonomous units, respectively). These are used to conduct the spatial intersections between group-wise settlement patterns and administrative boundaries. 

- Territorial autonomy: This folder contains three files, which together provide detailed information on the underlying institutional indicators used to construct my time-varying, administrative unit-level measure for territorial autonomy; variable names are equivalent to those used in the replication data (see file "variable_dictionary.csv"):
-- "sa_tiers_indicators.csv": contains autonomy tier-level information, coding comments, and constitutional articles underlying all autonomy indicators, except for financial guarantees (see data supplement, sections S2.1-S2.2);
-- "sa_country_level_candidate_financial_guarantees.csv": contains country-level coding comments and constitutional articles underlying the identification of candidate autonomous units and my financial guarantees indicator (see data supplement, sections S2.1-S2.2);
-- "sa_admin_unit_level_all_indicators.csv": contains all indicators measured at the administrative unit year-level, including financial guarantees (see data supplement, sections S2.1-S2.2); it also contains ethnic ID codes of institutionally-designated second-order majority group(s) (variables "ethnic_t_1" ... "ethnic_t_6") (see data supplement, section S1.3).

- Ethnically attributed violence: This folder contains two subfolders that provide information on ethnically attributed violence (see data supplement section S3):
-- NSV: The file "NonState_v20_1_ethnic_attribution.csv" provides information on the ethnic identities of actors in the UCDP Non State Violence Dataset (UCDP NSV, Sundberg, Eck & Kreutz 2012), Version 20.1, as described in the article's supplementary material. Variables "conflict_id" to "version" are as in the original UCDP NSV dataset; variables "side_a_gwgroupid1" to "side_a_gwgroupid6" denote the EPR group IDs involved on side a of the conflict (if any); variables "side_b_gwgroupid1" to "side_b_gwgroupid6" denote the EPR group IDs involved on side b of the conflict (if any). Group IDs that include an "X" denote distinct ethnic groups that are not judged as "politically relevant" by EPR and are hence not linked to the ethnic actors in the article's main analyses.
-- OSV: The file "EMM_osv_coding_edited.csv" provides information on the ethnic identities of perpetrators of violence in the UCDP One-Sided Violence Dataset (UCDP OSV, Eck & Hultman 2007), Version 20.1, as described in the article's supplementary material. Variables "location" to "is_government_actor" are as in the original UCDP OSV dataset; variables "gwgroupid1" to "gwgroupid6" denote the EPR group IDs involved among the perpetrators (if any). Group IDs that include an "X" denote ethnic groups that are not judged as "politically relevant" by EPR and are hence not linked to the ethnic actors in the article's analyses.
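
When working with the group ID columns described above, IDs containing an "X" need to be separated from EPR group IDs. The following is a minimal Python sketch of that filter, not the code used in the data set-up scripts. The example ID values are invented placeholders, and treating empty cells as "no group" is an assumption.

```python
def split_group_ids(ids):
    """Separate EPR group IDs from IDs flagged with an "X".

    `ids` holds the values of "gwgroupid1" ... "gwgroupid6" (or the
    "side_a_"/"side_b_" equivalents) for one row.
    """
    relevant, not_relevant = [], []
    for gid in ids:
        if gid is None or str(gid).strip() == "":
            continue  # empty cell: no group coded in this slot (assumption)
        gid = str(gid).strip()
        if "X" in gid:
            not_relevant.append(gid)  # not "politically relevant" in EPR
        else:
            relevant.append(gid)
    return relevant, not_relevant


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(split_group_ids(["101", "X7", "", None]))
```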


****************
5. EXTERNAL DATA
****************

The folder "external_data" contains all external data required for setting up the article's data structure and for constructing its main (independent and dependent) variables. These data are called by several data set-up scripts (see below), which combine them with the article's original data to set up the analysis dataframes and variables. Subdivided into ten subfolders, this folder includes the following external data required for these purposes:

- app: This folder contains information from the African Peace Processes Dataset (APP, Duursma & Gamez 2022), used to set up the analyses in appendix 5.4.
-- "app_nonstate_negotiations.csv": all information from Excel tab "intra-state" in original APP dataset.
-- "app_nonstate_violence.csv": all information from Excel tab "intra-state negotiations" in original APP dataset.

- CShapes_augmented: This folder contains "cshapes_augmented.shp", corresponding to the original CShapes 1.0 dataset (Weidmann et al. 2010) plus Tibet 1946-1950. This is used for identifying PRIO grid cells in each country year, necessary to set up the group/dyad grid cell year data structure used in robustness checks in appendix 3.7 and for the instrumental variable analyses in appendix 4.

- Elfversson2015_replication: This folder contains "replicationdata.dta", which is the replication data from Elfversson (2015), used to set up the analyses in appendix 5.4.

- EPR_augmented: This folder contains various data linked to the Ethnic Power Relations Dataset (Vogt et al. 2015). This is used to set up all group and dyad level data structures used throughout the article's empirical analyses. The data have been augmented by adding an "other" group with a statewide settlement pattern, whose size corresponds to 1 minus the sum of all coded groups' sizes, but is at least 0.005. Moreover, a small number of additional, constitutionally-recognised groups have been added to ensure congruence with the Constitutional Power-Sharing Dataset (Juon 2020). These additions are important to ensure that the relative sizes of groups within each administrative unit are correctly calculated even in cases where EPR does not code the largest group(s) in a unit as "politically relevant" at the national level (see data supplement S1).
-- "ACD2EPR-2018.1.1.csv": ACD2EPR 2018.1.1 version, used for linking civil violence to EPR groups.
-- "ACD2EPR-2021.csv": ACD2EPR 2021 version, used for linking civil violence to EPR groups.
-- "epr_augmented_gperiod.csv": EPR 2018 period-wise version, used for group list and control variables.
-- "epr_augmented_gyear.csv": EPR 2018 year-wise version (variable set I), used for group list and control variables.
-- "epr_power_other.csv": EPR 2018 year-wise version (variable set II), used for group list and control variables.
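
The size rule for the added "other" group can be written out as a one-line calculation. The Python sketch below reflects one reading of the description above (remainder of the population, floored at 0.005); it is an illustration, not the code used to build the augmented files, and the function name is hypothetical.

```python
def other_group_size(coded_sizes, floor=0.005):
    """Size of the residual "other" group in one country-year.

    Takes the relative sizes of all coded groups and returns the remainder,
    but never less than `floor`.
    """
    return max(1.0 - sum(coded_sizes), floor)


if __name__ == "__main__":
    print(other_group_size([0.6, 0.3]))    # remainder of roughly 0.1
    print(other_group_size([0.7, 0.3]))    # no remainder, so the floor applies
```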

- geoEPR_augmented: This folder contains "geoepr_augmented_simplified_clean.shp", an augmented and spatially simplified version of the GeoEPR dataset (Vogt et al. 2015). This is used predominantly for intersecting administrative units and grid cells with group-wise settlement patterns, required to set up the data structure at the respective levels. Again, the data have been augmented by adding an "other" group with a statewide settlement pattern, whose size corresponds to 1 minus the sum of all coded groups' sizes, but is at least 0.005. Moreover, a small number of additional, constitutionally-recognised groups have been added to ensure congruence with the Constitutional Power-Sharing Dataset (Juon 2020). These additions are important to ensure that the relative sizes of groups within each administrative unit are correctly calculated even in cases where EPR does not code the largest group(s) in a unit as "politically relevant" at the national level (see data supplement S1).

- hyde_population_density: This folder contains information on the time-variant local population density, based on the History Database of the Global Environment (HYDE version 3.2, Klein Goldewijk et al. 2017). For example, "2017AD_pop/popd_2017AD.asc" contains information for the year 2017.
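
For orientation, the file naming shown in the 2017 example can be expressed as a small Python helper. This is a sketch that assumes files for other years follow the same pattern as the 2017 example; check the folder contents before relying on it. The function name is hypothetical.

```python
def hyde_density_path(year):
    """Relative path of the population density raster for `year`.

    Assumes every year follows the pattern of the 2017 example above.
    """
    return f"{year}AD_pop/popd_{year}AD.asc"


if __name__ == "__main__":
    print(hyde_density_path(2017))  # 2017AD_pop/popd_2017AD.asc
```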

- icow: This folder contains "coldata100.csv", which is the ICOW Colonial History Data Set, version 1.1 (Hensel 2018). This is used for defining the colonial subsample and constructing the colonial heritage variables used in the instrumental variables analyses (appendix 4).

- PRIO_Grid: This folder contains geographic information from the PRIO grid and variables preassembled and measured at this level (Tollefsen et al. 2012). This is used for identifying PRIO grid cells in each country year, necessary to set up the group/dyad grid cell year data structure for robustness checks in appendix 3.7 and the instrumental variable analyses in appendix 4. Moreover, it is also used for deriving several controls at the grid cell level, for instance land-use controls ("agri_ih", "pasture_ih", "grass_ih").
-- "PRIO-GRID Static Variables - 2022-01-19.csv": This file contains time-invariant data on each grid cell's total land area, required for delimiting the sample for robustness checks in appendix 3.7 and instrumental variable analyses in appendix 4.
-- "PRIO-GRID Yearly Variables for 1988-2014 - 2022-02-27.csv": Based on Tollefsen and colleagues (2012), this file contains data on each grid cell's land-use (agriculture, grass, pasture; originally from Meiyappan & Jain 2012), gross cell product (originally from Nordhaus 2006), and calibrated nightlights (originally from Elvidge et al. 2014). All of these variables are used for the instrumental variable robustness checks in appendix 4.2.
-- "PRIO-GRID Yearly Variables for 2010-2010 - 2022-01-19.csv": Based on Tollefsen and colleagues (2012), this file contains data on each grid cell's total population (originally from Center for International Earth Science Information Network 2005). This variable is used to define the sample in all grid cell level analyses (excluding grid cells without population).
-- "priogrid_cell.shp": This shapefile contains polygons and IDs for each PRIO grid cell (Tollefsen et al. 2012).

- survey_data: This folder contains "surveydata.csv", which provides individual level data on respondent ethnicity, grievances, and standardised demographic characteristics, collated from diverse global and regional surveys, used for analyses of group-wise grievances in appendix 5.3 (assembled by Juon 2023).

- ucdp: This folder contains various information from the Uppsala Conflict Data Program (UCDP), necessary to construct the dependent variables used across this article's analyses. This is structured into six subfolders:
-- actor_tables: This folder contains ID translation tables required to harmonise data from different versions of the UCDP dataset, at the levels of the conflict actor ("translate_actor.csv") and conflict dyad ("translate_dyad.csv"). Originally available from https://ucdp.uu.se/downloads/index.html#idtranslation.
-- nsv_issues: This folder contains information on the issues underlying communal conflicts in Africa (von Uexkull and Pettersson 2018), required for the robustness checks in appendix 3.7.5.
-- osv: This folder contains the UCDP One-sided violence dataset (Eck & Hultman 2007; Pettersson & Öberg 2020), required for the additional analyses in appendix 5.2.
-- osv_ethnic: This folder contains the Ethnic One-sided violence dataset (Fjelde et al. 2021), required for the additional analyses in appendix 5.2.
-- ucdp_dyadic: This folder contains the UCDP Dyadic Dataset, version 20.1 (Harbom, Melander & Wallensteen 2008; Pettersson & Öberg 2020), required to set up all analyses of civil and communal violence.
-- ucdp_ged: This folder contains the UCDP Georeferenced Event Dataset, version 20.1 (Sundberg & Melander 2013), required to set up all analyses of civil and communal violence.

List of sources for all external data:
Center for International Earth Science Information Network and Agricultura Tropical. 2005. “Gridded Population of the World Version 3.” 2005. http://sedac.ciesin.columbia.edu/gpw.
Duursma, Allard, and Samantha Gamez. 2022. “Introducing the African Peace Processes (APP) Dataset: Negotiations and Mediation in Interstate, Intrastate, and Non-State Conflicts in Africa.” Journal of Peace Research.
Eck, Kristine, and Lisa Hultman. 2007. “One-Sided Violence Against Civilians in War: Insights from New Fatality Data.” Journal of Peace Research 44 (2): 233–46. https://doi.org/10.1177/0022343307075124.
Elfversson, Emma. 2015. “Providing Security or Protecting Interests? Government Interventions in Violent Communal Conflicts in Africa.” Journal of Peace Research 52 (6): 791–805. https://doi.org/10.1177/0022343315597968.
Elvidge, Christopher D., Feng-Chi Hsu, Kimberly E. Baugh, Tilottama Ghosh, and Qihao Weng. 2014. “National Trends in Satellite Observed Lighting: 1992-2012.” Global Urban Monitoring and Assessment Through Earth Observation 23: 97–118.
Fjelde, Hanne, Lisa Hultman, Livia Schubiger, Lars-Erik Cederman, Simon Hug, and Margareta Sollenberg. 2021. “Introducing the Ethnic One-Sided Violence Dataset.” Conflict Management and Peace Science 38 (1): 109–26. https://doi.org/10.1177/0738894219863256.
Harbom, Lotta, Erik Melander, and Peter Wallensteen. 2008. “Dyadic Dimensions of Armed Conflict, 1946—2007.” Journal of Peace Research 45 (5): 697–710. https://doi.org/10.1177/0022343308094331.
Hensel, Paul R. 2018. “ICOW Colonial History Data Set, Version 1.1.” 2018. http://www.paulhensel.org/icowcol.html.
Juon, Andreas. 2020. “Minorities Overlooked: Group-Based Power-Sharing and the Exclusion-amid-Inclusion Dilemma.” International Political Science Review 41 (1): 89–107. https://doi.org/10.1177/0192512119859206.
———. 2023. “Inclusion, Recognition, and Inter-Group Comparisons: The Effects of Power-Sharing Institutions on Grievances.” Journal of Conflict Resolution 67 (9): 1783–1810. https://doi.org/10.1177/00220027231153583.
Klein Goldewijk, Kees, Arthur Beusen, Jonathan Doelman, and Elke Stehfest. 2017. “Anthropogenic Land Use Estimates for the Holocene – HYDE 3.2.” Earth System Science Data 9 (2): 927–53. https://doi.org/10.5194/essd-9-927-2017.
Meiyappan, Prasanth, and Atul K. Jain. 2012. “Three Distinct Global Estimates of Historical Land-Cover Change and Land-Use Conversions for over 200 Years.” Frontiers of Earth Science 6 (2): 122–39.
Nordhaus, William D. 2006. “Geography and Macroeconomics: New Data and New Findings.” Proceedings of the National Academy of Sciences 103 (10): 3510–17.
Pettersson, Therése, and Magnus Öberg. 2020. “Organized Violence, 1989–2019.” Journal of Peace Research 57 (4): 597–613. https://doi.org/10.1177/0022343320934986.
Sundberg, Ralph, and Erik Melander. 2013. “Introducing the UCDP Georeferenced Event Dataset.” Journal of Peace Research 50 (4): 523–32. https://doi.org/10.1177/0022343313484347.
Tollefsen, Andreas Forø, Håvard Strand, and Halvard Buhaug. 2012. “PRIO-GRID: A Unified Spatial Data Structure.” Journal of Peace Research 49 (2): 363–74. https://doi.org/10.1177/0022343311431287.
Vogt, M., N.-C. Bormann, S. Rüegger, L.-E. Cederman, P. Hunziker, and L. Girardin. 2015. “Integrating Data on Ethnicity, Geography, and Conflict: The Ethnic Power Relations Data Set Family.” Journal of Conflict Resolution 59 (7): 1327–42. https://doi.org/10.1177/0022002715591215.
Weidmann, Nils B., Doreen Kuse, and Kristian Skrede Gleditsch. 2010. “The Geography of the International System: The CShapes Dataset.” International Interactions 36 (1): 86–106. https://doi.org/10.1080/03050620903554614.


*********************
6. DATA SET-UP SCRIPT
*********************

The folder "data_set_up_script" contains R scripts which call on the article's original data and on additional external data sources (see above) and which transform these datasets and variables to create the analysis dataframes and main dependent and independent variables used throughout the article's analyses. It is divided into three folders. NOTE: These three folders ("01_sau_ethnic_demographics", "02_components", "03_dataframes_variables") are zipped due to Dataverse's file size limits. These need to be unzipped before running the data set-up script. NOTE 2: Running the whole script can take a long time; on a MacBook with an M1 Max chip and 64 GB of memory, it takes approximately 30 hours.

The data set-up script has the following structure (run in the same order to set up the data from start to finish):

- 01_sau_ethnic_demographics: This folder contains R scripts that intersect the group-wise ethnic settlement patterns provided by GeoEPR with regional administrative boundaries from the new Significant Administrative Units Dataset ("sau_geoepr_augmented_intersection.R") and with boundaries of all PRIO grid cells ("prio_geopr_augmented_intersection.R"). These scripts also produce outputs which contain ethnic demographics in each unit and grid cell, corresponding to several key control variables used throughout the article's analyses. These outputs are saved in .csv format ("sau_geoepr_augmented_intersection.csv" and "prio_geopr_augmented_intersection.csv", respectively).

- 02_components: This folder contains an R-script ("components.R") that constructs the diverse components which link the original and external data. The corresponding output is saved into the following subfolders, which contain the following information:
-- admin_unit_group_link: This folder contains information on the distances between each administrative unit coded by SAU and each ethnic group in EPR ("geoepr_sau_geo5_distances3_full.csv"), used to delimit the sample in all group/dyad unit year-level analyses as well as to identify core settlement areas (distance < 50km, but no spatial overlap, hence estimated population share = 0). It also contains the resulting sample of all groups within 50km of each unit ("sau_geo5_groups_y.csv"), dyads within 50km of each unit ("sau_geo5_groups_y_d.csv"), and directed dyads within 50km of each unit ("sau_geo5_groups_y_d_dir.csv").
-- admin_unit_main_level: This folder contains information on the main level at which administrative units in each country year should be selected (see data supplement S1.1.1). This serves to identify substantial autonomy arrangements at the level below the first-order admin unit (e.g. Kosovo within Yugoslavia) or above it (e.g. Mindanao in the Philippines).
-- admin_unit_violence_link: This folder contains information on the administrative unit contained in SAU in which each UCDP violence event of the respective type occurred: "ged_sau_geo5_int2.csv" (non-state violence), "ged_osv_sau_geo5_int2.csv" (one-sided violence), "ged_cw_sau_geo5_int2.csv" (state-based violence).
-- group_geo_violence_link: Ethnically linked, yearly violence events, split between all involved ethnic groups from EPR: "ged_attributed.csv" (communal violence) and "ged_osv_attributed.csv" (one-sided violence).
-- prio_admin_link: "gid_adm_link.csv" links all PRIO grid cells in each country to the administrative unit in SAU in which they are predominantly located.
-- prio_group_link: This folder contains information on the distances between each PRIO grid cell and each ethnic group ("geoepr_gid_y.csv"), used to delimit the sample in all group/dyad grid cell year-level analyses as well as to identify core settlement areas. It also contains the resulting sample of all groups within 50km of each grid cell ("geoepr_gid_y.csv"), dyads within 50km of each grid cell ("geoepr_gid_y_d.csv"), and directed dyads within 50km of each grid cell ("geoepr_gid_y.csv").
-- prio_violence_link: This folder contains information on the PRIO grid cell in which each UCDP event of the respective type of violence occurred: "ged_gid.csv" (non-state violence), "ged_osv_gid.csv" (one-sided violence), "ged_cw_gid.csv" (state-based violence).
-- relevant_admin_units: This folder contains the subset of polygons selected from SAU, measured at December 31 of each year and only at the main level (first-order, subordinate, superordinate, special region) (see data supplement S1.1.1). All polygons are simplified to speed up the following geospatial intersections and analyses.
-- violence_spatial_lags: This folder contains information on the distances between all UCDP violence events in the same country, used to calculate spatial lags variables: "cv_distances3.csv" (non-state violence), "osv_distances3.csv" (one-sided violence), "cw_distances3.csv" (state-based violence).
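
The 50km rule used in the "admin_unit_group_link" and "prio_group_link" components can be sketched as follows. This Python example is an illustration under stated assumptions, not the logic of "components.R": it assumes a distance of 0 denotes spatial overlap, that dyads are unordered pairs of in-sample groups, and that directed dyads are ordered pairs. All function names and group labels are hypothetical.

```python
from itertools import combinations, permutations

THRESHOLD_KM = 50.0  # distance cut-off named in this readme


def classify_group_unit(distance_km):
    """Classify one group relative to one administrative unit or grid cell."""
    if distance_km >= THRESHOLD_KM:
        return "outside sample"
    if distance_km > 0:
        # Within 50km but no spatial overlap: estimated population share = 0.
        return "nearby, no overlap"
    return "overlapping"


def groups_in_sample(distances):
    """Groups within the threshold; `distances` maps group label -> km."""
    return sorted(g for g, d in distances.items() if d < THRESHOLD_KM)


def dyads_in_sample(distances, directed=False):
    """Pairs of in-sample groups (ordered pairs if `directed` is True)."""
    pair = permutations if directed else combinations
    return list(pair(groups_in_sample(distances), 2))


if __name__ == "__main__":
    example = {"A": 0.0, "B": 12.5, "C": 80.0}  # invented distances
    print({g: classify_group_unit(d) for g, d in example.items()})
    print(dyads_in_sample(example))
    print(dyads_in_sample(example, directed=True))
```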

- 03_dataframes_variables: Divided into 10 subfolders, this folder contains R-scripts that assemble the components calculated above, combine them with original data and external data, transform them into the main analysis dataframes, and construct the main dependent and independent variables used across the article's analyses. The subfolders also contain the output files from these transformation steps, which correspond directly to the replication data at the respective levels. NOTE: Only the main independent and dependent variables as well as key new control variables (e.g. population shares in each admin unit) are assembled in these files. Additional control variables not required to set up the data structure that are directly taken over from other external sources are not provided here (see replication data for corresponding values for each observation and main article/appendix for description of sources). NOTE 2: Some output here is produced in two versions: one corresponding to the full sample ("_full") (all countries and all EPR and "other" groups in each year) and one version for a subset thereof ("_subset") (only countries and group years used in the article's analyses). The latter corresponds to the sample in the main analyses, while the former is used for additional variable transformations in later steps.
-- a_group_unit_year: Contains "group_unit_year.R", which assembles the dataframe and variables at level of group-administrative unit-year (for main analyses and several robustness checks).
-- b_dyad_unit_year: Contains "dyad_unit_year.R", which assembles the dataframe and variables at level of dyad-administrative unit-year (for main analyses and several robustness checks).
-- c_dyad_directed_unit_year: Contains "dyad_directed_unit_year.R", which assembles the dataframe and variables at level of directed dyad-administrative unit-year (for analyses of one-sided violence in appendix 5.2).
-- d_unit_year: Contains "unit_year.R", which assembles the dataframe and variables at level of administrative unit-year (for robustness checks in appendix 3.6).
-- e_group_grid_year: Contains "group_grid_year.R", which assembles the dataframe and variables at level of group-grid cell-year (for robustness checks in appendix 3.7 and instrumental variable analyses in appendix 4).
-- f_dyad_grid_year: Contains "dyad_grid_year.R", which assembles the dataframe and variables at level of dyad-grid cell-year (for robustness checks in appendix 3.7 and instrumental variable analyses in appendix 4).
-- g_group_year: Contains "group_year.R", which assembles the dataframe and variables at level of group-year (for robustness checks in appendix 3.7).
-- h_dyad_year: Contains "dyad_year.R", which assembles the dataframe and variables at level of dyad-year (for robustness checks in appendix 3.7).
-- i_conflict_year: Contains "conflict_year.R", which assembles the dataframe and variables at level of conflict year (for analyses of state intervention, mediation, negotiations, and agreements in non-state conflicts in Africa in appendix 5.4).
-- j_individual: Contains "individual.R", which assembles the dataframe and variables at level of individual-(survey-)year (for analyses of group-wise grievances in appendix 5.3).
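
The subfolder and script names listed above follow a regular pattern: the script name is the subfolder name without its letter prefix. The Python sketch below lists the expected script paths on that basis, which can help confirm that the folder was unzipped completely. It assumes the letter prefixes (a to j) indicate the intended run order, which this readme does not state explicitly; the function name is illustrative.

```python
from pathlib import PurePosixPath

# Subfolders of "03_dataframes_variables", as listed above.
SUBFOLDERS = [
    "a_group_unit_year",
    "b_dyad_unit_year",
    "c_dyad_directed_unit_year",
    "d_unit_year",
    "e_group_grid_year",
    "f_dyad_grid_year",
    "g_group_year",
    "h_dyad_year",
    "i_conflict_year",
    "j_individual",
]


def script_path(subfolder):
    """Path of the R script inside `subfolder`, relative to the repository root."""
    # Strip the ordering prefix ("a_", "b_", ...) to obtain the script name.
    script = subfolder.split("_", 1)[1] + ".R"
    return PurePosixPath(
        "data_set_up_script", "03_dataframes_variables", subfolder, script
    )


if __name__ == "__main__":
    for folder in SUBFOLDERS:
        print(script_path(folder))
```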


******************
7. GENERAL REMARKS
******************

For questions or comments, do not hesitate to contact Andreas Juon (andreas.juon@icr.gess.ethz.ch).

This repository is available through the APSR Dataverse and, alternatively, via Andreas Juon's Dataverse (https://dataverse.harvard.edu/dataverse/andreas_juon/).